Automated Population of Cyc: Extracting Information about Named-entities from the Web
Authors
Abstract
Populating the Cyc Knowledge Base (KB) has been a manual process until very recently. However, Cyc now contains enough knowledge that it is feasible to attempt to acquire additional knowledge autonomously. This paper describes a system that can collect and validate formally represented, fully integrated knowledge about various entities of interest (e.g., famous people and organizations) from the Web or any other electronically available text corpus. Experimental results and lessons learned from their analysis are presented.
Similar resources
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore
Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...
ClassMate: A System for Automated Event Extraction from Course Websites
Websites contain a huge amount of time-critical data in highly unstructured and heterogeneous form. Information Extraction systems can extract relevant entities and relationships from these sites, and identify, classify and categorize them. In this paper, we present ClassMate, a complete system for extracting key course-related events from university course websites. ClassMate pipelines web dat...
Automatic population of knowledge bases with multimodal data about named entities
Knowledge bases are of great importance for Web search, recommendations, and many Information Retrieval tasks. However, maintaining them for less popular entities is often a bottleneck. Typically, such entities have limited textual coverage and only a few ontological facts. Moreover, these entities are not well populated with multimodal data, such as images, videos, or audio recordings. The g...
Gathering and Managing Facts for Intelligence Analysis
This paper presents a novel method, based on the Cyc Knowledge Base and Inference Engine, of gathering, organizing and sharing information about entities of interest (be they people, organizations, events or some other type of entity). The formal representations used in the Fact Sheets allow users to easily share information with others, run automated queries against the information, and all...